Construct Validity of e-rater® in Scoring TOEFL® Essays

Author

  • Yigal Attali

Abstract

This study examined the construct validity of the e-rater automated essay scoring engine as an alternative to human scoring in the context of TOEFL essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two e-rater scores were investigated in this study, the first based on optimally predicting the human essay score and the second based on equal weights for the different features of e-rater. Within a multitrait-multimethod approach, the correlations and reliabilities of human and e-rater scores were analyzed together with TOEFL subscores (structured writing, reading, and listening) and with essay length. Possible biases between human and e-rater scores were examined with respect to differences in performance across countries of origin and differences in difficulty across prompts. Finally, a factor analysis was conducted on the e-rater features to investigate the interpretability of their internal structure and determine which of the two e-rater scores reflects this structure more closely. Results showed that the e-rater score based on optimally predicting the human score measures essentially the same construct as human-based essay scores with significantly higher reliability and consequently higher correlations with related language scores. The equal-weights e-rater score showed the same high reliability but significantly lower correlation with essay length. It is also aligned with the 3-factor hierarchical (word use, grammar, and discourse) structure that was discovered in the factor analysis. Both e-rater scores also successfully replicate human score differences between countries and prompts.
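The abstract contrasts two ways of combining e-rater features into one essay score: weights chosen to best predict the human score, and equal weights for all features. Below is a minimal sketch of that contrast. It assumes a numeric feature matrix in which every feature is oriented so that higher values mean better writing. The function names, the z-score averaging, and the final rescaling to the human score scale are illustrative assumptions, not the operational e-rater procedure. The last helper states the classical test theory bound behind the claim that higher reliability permits higher correlations: an observed correlation cannot exceed the square root of the product of the two measures' reliabilities.

```python
import numpy as np


def optimal_weights_score(features: np.ndarray, human: np.ndarray) -> np.ndarray:
    """Score essays with OLS weights fitted to predict the human essay score.

    features: (n_essays, n_features); human: (n_essays,) human scores.
    """
    # Prepend an intercept column, then solve the least-squares problem.
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, human, rcond=None)
    return design @ coef


def equal_weights_score(features: np.ndarray, human: np.ndarray) -> np.ndarray:
    """Average the z-scored features, then map onto the human score scale.

    Assumption: every feature is oriented so that higher means better writing
    (error-type features would need their sign flipped first).
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    composite = z.mean(axis=1)  # each standardized feature gets the same weight
    composite_z = (composite - composite.mean()) / composite.std()
    # Hypothetical rescaling step: match the human score mean and SD.
    return human.mean() + human.std() * composite_z


def correlation_ceiling(rel_x: float, rel_y: float) -> float:
    """Classical test theory upper bound on the observed correlation
    between two measures with reliabilities rel_x and rel_y."""
    return float(np.sqrt(rel_x * rel_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))  # synthetic stand-in for e-rater features
    h = X @ np.array([0.5, 0.3, 0.1, 0.1]) + rng.normal(scale=0.5, size=500)
    for name, score in [("optimal", optimal_weights_score(X, h)),
                        ("equal", equal_weights_score(X, h))]:
        print(name, round(np.corrcoef(score, h)[0, 1], 3))
    print("ceiling", correlation_ceiling(0.64, 0.81))
```

Because least squares picks the linear combination of features that correlates most highly with the human score in the calibration sample, the optimal-weights score always matches human scores at least as well there. The equal-weights score gives up some of that fit in exchange for weights that do not depend on which features happen to predict human ratings.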

Similar resources

Analytic Scoring of TOEFL® CBT Essays: Scores From Humans and E-rater

The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and e-rater essay feature variables in the context of the TOEFL computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic essay scores provided by human raters and essay feature variab...

A Differential Word Use Measure for Content Analysis in Automated Essay Scoring

This paper proposes an alternative content measure for essay scoring, based on the difference in the relative frequency of a word ...

Stumping e-rater: challenging the validity of automated essay scoring

This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board. ...

Automated Essay Scoring with the E-rater System

This paper provides an overview of e-rater®, a state-of-the-art automated essay scoring system developed at the Educational Testing Service (ETS). E-rater is used as part of the operational scoring of two high-stakes graduate admissions programs: the GRE® General Test and the TOEFL iBT® assessments. E-rater is also used to provide score reporting and diagnostic feedback in Criterion℠, ETS’s ...

Automated Evaluation of Essays and Short Answers

Essay questions designed to measure writing ability, along with open-ended questions requiring short answers, are highly valued components of effective assessment programs, but the expense and logistics of scoring them reliably often present a barrier to their use. Extensive research and development efforts at Educational Testing Service (ETS) over the past several years (see http://www.ets.org...

Publication date: 2007